The Kepler and TESS missions have generated over 100,000 potential transit signals that must be processed in order to create a catalog of planet candidates. During the last few years, there has been growing interest in using machine learning to analyze these data in search of new exoplanets. Different from existing machine learning works, ExoMiner, the deep learning classifier proposed in this work, mimics how domain experts examine diagnostic tests to vet a transit signal. ExoMiner is a highly accurate, explainable, and robust classifier that 1) allows us to validate 301 new exoplanets from the MAST Kepler Archive and 2) is general enough to be applied to missions such as the ongoing TESS mission. We perform an extensive experimental study to verify that ExoMiner is more reliable and accurate than existing transit-signal classifiers in terms of different classification and ranking metrics. For example, for a fixed precision value of 99%, ExoMiner retrieves 93.6% of all exoplanets in the test set (i.e., recall = 0.936), while this rate is 76.3% for the best existing classifier. Furthermore, the modular design of ExoMiner favors its explainability. We introduce a simple explainability framework that provides experts with feedback on why ExoMiner classifies a transit signal into a specific class label (e.g., planet candidate or not planet candidate).
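To make the reported operating point concrete, here is a minimal sketch (not the authors' code) of how recall at a fixed precision is computed: sweep the classifier's decision threshold and take the best recall among cutoffs whose precision stays at or above 99%. The labels and scores below are synthetic stand-ins for a labeled test set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 1 = exoplanet, 0 = not; scores from some classifier.
labels = rng.integers(0, 2, size=1000)
scores = np.clip(labels * 0.7 + rng.normal(0.3, 0.2, size=1000), 0.0, 1.0)

def recall_at_precision(labels, scores, min_precision=0.99):
    order = np.argsort(-scores)             # rank examples by descending score
    tp = np.cumsum(labels[order])           # true positives at each cutoff
    predicted = np.arange(1, len(scores) + 1)
    precision = tp / predicted
    recall = tp / labels.sum()
    ok = precision >= min_precision         # cutoffs meeting the precision bar
    return recall[ok].max() if ok.any() else 0.0

print(f"recall @ 99% precision: {recall_at_precision(labels, scores):.3f}")
```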
Simulated humanoids are an appealing research domain due to their physical capabilities. Nonetheless, they are also challenging to control, as a policy must drive an unstable, discontinuous, and high-dimensional physical system. One widely studied approach is to utilize motion capture (MoCap) data to teach the humanoid low-level skills (e.g., standing, walking, and running) that can then be re-used to synthesize high-level behaviors. However, even with MoCap data, controlling simulated humanoids remains very hard, as MoCap data offers only kinematic information; finding the physical control inputs that realize the demonstrated motions requires computationally intensive methods such as reinforcement learning. Thus, despite publicly available MoCap data, its utility has been limited to institutions with large-scale compute. In this work, we dramatically lower the barrier to productive research on this topic by training and releasing high-quality agents that can track over three hours of MoCap data in a dm_control physics-based environment. We release MoCapAct (Motion Capture with Actions), a dataset of these expert agents and their rollouts, which contain proprioceptive observations and actions. We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control, and we show that the learned low-level component can be re-used to efficiently learn downstream high-level tasks. Finally, we use MoCapAct to train an autoregressive GPT model and show that it can control a simulated humanoid to perform natural motion completion given a motion prompt. Videos of the results and links to the code and dataset are available at https://microsoft.github.io/mocapact.
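As an illustration of what rollouts of (proprioceptive observation, action) pairs enable, below is a minimal behavior-cloning sketch of the kind of supervised learning such data supports. The dimensions and arrays are hypothetical stand-ins; the dataset's actual loading API is not reproduced here.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 137, 56           # hypothetical humanoid dimensions
obs = torch.randn(4096, obs_dim)     # stand-in proprioceptive observations
act = torch.randn(4096, act_dim)     # stand-in expert actions from rollouts

# A small MLP policy regressing expert actions from observations.
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                       nn.Linear(256, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(100):
    idx = torch.randint(0, obs.shape[0], (256,))       # minibatch of pairs
    loss = nn.functional.mse_loss(policy(obs[idx]), act[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```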
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.
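To make the masking correspondence concrete, here is a small illustrative sketch (not the paper's exact scheme) of how different tasks amount to choosing which tokens of a state/action sequence are visible when predicting the masked targets.

```python
import numpy as np

T = 5  # timesteps in the trajectory

def visible_tokens(task, T, t=2):
    """Which state/action tokens are visible when predicting the masked targets."""
    s_vis = np.zeros(T, dtype=bool)
    a_vis = np.zeros(T, dtype=bool)   # actions are hidden in all three examples
    if task == "behavior_cloning":    # target a_t, condition on s_0..s_t
        s_vis[: t + 1] = True
    elif task == "inverse_dynamics":  # target a_t, condition on s_t and s_{t+1}
        s_vis[t] = s_vis[t + 1] = True
    elif task == "waypoint":          # target middle states, condition on endpoints
        s_vis[0] = s_vis[-1] = True
    return s_vis, a_vis

for task in ("behavior_cloning", "inverse_dynamics", "waypoint"):
    s_vis, a_vis = visible_tokens(task, T)
    shown = [f"s{i}" for i in np.flatnonzero(s_vis)] + \
            [f"a{i}" for i in np.flatnonzero(a_vis)]
    print(f"{task:18s} visible: {shown}")
```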
Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although capable of seeing only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens. Additionally, when trained with partial observations and evaluated with incrementally more complete observations, DRQN's performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer, and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.
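Below is a minimal PyTorch sketch of the architectural change described above: the first post-convolutional fully-connected layer of DQN is replaced by an LSTM, so the network sees one frame per timestep and integrates information through its recurrent state. Layer sizes follow the common DQN convention for 84x84 inputs; this is an illustration, not the paper's exact code.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, n_actions, hidden=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),   # single-frame input
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        # LSTM replaces the first post-convolutional fully-connected layer.
        self.lstm = nn.LSTM(64 * 7 * 7, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frames, state=None):
        # frames: (batch, time, 1, 84, 84) -- one frame per timestep
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))
        feats = feats.reshape(b, t, -1)
        out, state = self.lstm(feats, state)  # integrate across timesteps
        return self.q_head(out), state        # Q-values per timestep

q_values, hidden = DRQN(n_actions=18)(torch.zeros(2, 10, 1, 84, 84))
print(q_values.shape)  # torch.Size([2, 10, 18])
```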
Convolutional neural networks (CNNs) have been extensively applied for image recognition problems, giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full length videos. The first method explores various convolutional temporal feature pooling architectures, examining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN. Our best networks exhibit significant performance improvements over previously published results on the Sports 1 million dataset (73.1% vs. 60.9%) and the UCF-101 datasets with (88.6% vs. 88.0%) and without additional optical flow information (82.6% vs. 73.0%).
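The following sketch contrasts the two families of methods described above: an LSTM over per-frame CNN features (the video as an ordered sequence) versus a simple order-agnostic temporal max-pooling baseline. The feature extractor is elided, and all dimensions are illustrative stand-ins rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

feat_dim, n_classes = 1024, 101   # e.g., UCF-101 has 101 action classes

class FrameSequenceClassifier(nn.Module):
    """Second method: model the video as an ordered sequence of frame features."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, 512, batch_first=True)
        self.head = nn.Linear(512, n_classes)

    def forward(self, frame_feats):           # (batch, time, feat_dim)
        out, _ = self.lstm(frame_feats)
        return self.head(out[:, -1])          # classify from the final timestep

# First method's simplest variant: temporal max pooling over frame features.
pool_head = nn.Linear(feat_dim, n_classes)
def max_pooled_logits(frame_feats):
    return pool_head(frame_feats.max(dim=1).values)

feats = torch.randn(2, 30, feat_dim)          # 30 frames of CNN features per clip
print(FrameSequenceClassifier()(feats).shape) # torch.Size([2, 101])
print(max_pooled_logits(feats).shape)         # torch.Size([2, 101])
```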